


A Problem Formulation using L1 and L

Neural Information Processing Systems

Proof of Lemma 2. Let U be the data set associated to ν.

Proof of Lemma 3. First, we prove that the property holds for the root node. We wish to prove the property for some unexplored leaf after the iteration. This is trivial if the leaf ν is not expanded in that iteration. Suppose the leaf ν is expanded.

Proof of Lemma 5. From Lemma 2, we note that Q Consider any path from the root to a leaf whose length is mK for some integer K > 0. We note that for each node ν and any of its children ν (Lemma 5).





Canonical Representations of Markovian Structural Causal Models: A Framework for Counterfactual Reasoning

arXiv.org Artificial Intelligence

Counterfactual reasoning aims at answering contrary-to-fact questions like "Would Alice have recovered had she taken aspirin?" and corresponds to the most fine-grained layer of causation. Critically, while many counterfactual statements cannot be falsified, even by randomized experiments, they underpin fundamental concepts like individual-wise fairness. Therefore, providing models to formalize and implement counterfactual beliefs remains a fundamental scientific problem. In the Markovian setting of Pearl's causal framework, we propose an alternative approach to structural causal models to represent counterfactuals compatible with a given causal graphical model. More precisely, we introduce counterfactual models, also called canonical representations of structural causal models. They enable analysts to choose a counterfactual assumption via random-process probability distributions with preassigned marginals and characterize the counterfactual equivalence class of structural causal models. Using these representations, we present a normalization procedure to disentangle the (arbitrary and unfalsifiable) counterfactual choice from the (typically testable) interventional constraints. In contrast to structural causal models, this allows analysts to implement many counterfactual assumptions while preserving interventional knowledge, and does not require any estimation step at the individual-counterfactual layer: it only requires making a choice. Finally, we illustrate the specific role of counterfactuals in causality and the benefits of our approach on theoretical and numerical examples.
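The idea of a counterfactual choice with preassigned marginals can be illustrated with a toy case. The sketch below is not the paper's construction; it assumes a single binary outcome with two potential outcomes Y0 (untreated) and Y1 (treated), and the function names and the numbers 0.3 and 0.6 are hypothetical. Both couplings reproduce the same interventional marginals, yet they disagree on the counterfactual quantity P(Y1 = 1 | Y0 = 0).

```python
def independent_coupling(p0, p1):
    """Joint pmf of (Y0, Y1) when the two potential outcomes are independent."""
    return {
        (a, b): (p0 if a else 1 - p0) * (p1 if b else 1 - p1)
        for a in (0, 1)
        for b in (0, 1)
    }


def comonotonic_coupling(p0, p1):
    """Joint pmf when Y0 = 1[U < p0] and Y1 = 1[U < p1] share one uniform U."""
    both = min(p0, p1)
    return {
        (1, 1): both,
        (1, 0): p0 - both,
        (0, 1): p1 - both,
        (0, 0): 1 - max(p0, p1),
    }


def marginals(joint):
    """Return (P(Y0 = 1), P(Y1 = 1)), the interventional marginals."""
    p0 = joint[(1, 0)] + joint[(1, 1)]
    p1 = joint[(0, 1)] + joint[(1, 1)]
    return p0, p1


def prob_y1_given_not_y0(joint):
    """Counterfactual quantity P(Y1 = 1 | Y0 = 0)."""
    return joint[(0, 1)] / (joint[(0, 0)] + joint[(0, 1)])


# Hypothetical interventional marginals: P(Y0 = 1) = 0.3, P(Y1 = 1) = 0.6.
indep = independent_coupling(0.3, 0.6)
comon = comonotonic_coupling(0.3, 0.6)

print(marginals(indep), prob_y1_given_not_y0(indep))
print(marginals(comon), prob_y1_given_not_y0(comon))
```

The two joint distributions cannot be told apart by any experiment that only reveals the marginals, which is why the coupling has to be chosen rather than estimated.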





Zero Inflation as a Missing Data Problem: a Proxy-based Approach

arXiv.org Artificial Intelligence

A common type of zero-inflated data has certain true values incorrectly replaced by zeros due to data recording conventions (rare outcomes assumed to be absent) or details of data recording equipment (e.g., artificial zeros in gene expression data). Existing methods for zero-inflated data either fit the observed data likelihood via parametric mixture models that explicitly represent excess zeros, or aim to replace excess zeros by imputed values. If the goal of the analysis relies on knowing true data realizations, a particular challenge with zero-inflated data is identifiability, since it is difficult to correctly determine which observed zeros are real and which are inflated. This paper views zero-inflated data as a general type of missing data problem, where the observability indicator for a potentially censored variable is itself unobserved whenever a zero is recorded. We show that, without additional assumptions, target parameters involving a zero-inflated variable are not identified. However, if a proxy of the missingness indicator is observed, a modification of the effect restoration approach of Kuroki and Pearl allows identification and estimation, provided that the proxy-indicator relationship is known. If this relationship is unknown, our approach yields a partial identification strategy for sensitivity analysis. Specifically, we show that only certain proxy-indicator relationships are compatible with the observed data distribution. We give an analytic bound for this relationship in cases with a categorical outcome, which is sharp in certain models. For more complex cases, sharp numerical bounds may be computed using methods in Duarte et al. [2023]. We illustrate our method via simulation studies and a data application on central line-associated bloodstream infections (CLABSIs).
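The matrix-inversion idea that underlies effect restoration can be sketched in a simplified setting. This is not the modification developed in the paper: it assumes a binary latent indicator R, a binary proxy W whose error rates P(W | R) are known, a categorical variable X, and that W is independent of X given R. All names and numbers are hypothetical.

```python
def mix(p_w_given_r, p_rx):
    """Forward model: P(W=w, X=x) = sum_r P(W=w | R=r) * P(R=r, X=x)."""
    n_x = len(p_rx[0])
    return [
        [sum(p_w_given_r[w][r] * p_rx[r][x] for r in (0, 1)) for x in range(n_x)]
        for w in (0, 1)
    ]


def restore_joint(p_w_given_r, p_wx):
    """Recover P(R=r, X=x) from P(W=w, X=x) by inverting the 2x2 proxy matrix.

    p_w_given_r[w][r] = P(W=w | R=r), assumed known.
    Assumes W is independent of X given R (an assumption of this sketch).
    """
    (a, b), (c, d) = p_w_given_r
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("singular proxy matrix: W carries no information about R")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    n_x = len(p_wx[0])
    return [
        [sum(inv[r][w] * p_wx[w][x] for w in (0, 1)) for x in range(n_x)]
        for r in (0, 1)
    ]


# Hypothetical proxy error rates: each column (fixed r) sums to 1.
proxy = [[0.9, 0.2],
         [0.1, 0.8]]

# Hypothetical true joint P(R=r, X=x) over three categories of X.
true_rx = [[0.10, 0.05, 0.05],
           [0.20, 0.30, 0.30]]

observed_wx = mix(proxy, true_rx)            # what the analyst can estimate
restored = restore_joint(proxy, observed_wx)  # matches true_rx up to rounding
print(restored)
```

When the proxy matrix is not known, the inversion is unavailable, which is the situation in which the abstract turns to bounds and sensitivity analysis instead.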